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Abstract We present the results using the AutoClass analysis application avail- 



able at NASA/Ames Intelligent Systems Div. (|2002p which is a Bayesian, finite 



mixture model classification system developed by ICheeseman and Stutzl p996p 
. We apply this system to Mount Wilson Solar Observatory (MWO) intensity 
and magnctogram images and classify individual pixels on the solar surface to 
calculate daily indices that are then correlated with total solar irradiance (TSI) 
to yield a set of regression coefficients. This approach allows us to model the 
TSI with a correlation of better than 0.96 for the period 1996 to 2007. These 
regression coefficients applied to classified pixels on the observed solar surface 
allow the construction of images of the Sun as it would be seen by TSI measuring 
instruments like the Solar Bolometric Imager recently flown by IFoukal et al.\ 
P004p . As a consequence of the very high correlation we achieve in reproducing 
the TSI record, our approach holds out the possibility of creating an on-going, 
accurate, independent estimate of TSI variations from ground-based observations 
which could be used to compare, and identify the sources of disagreement among, 
TSI observations from the various satellite instruments and to fill in gaps in the 
satellite record. Further, our spatially-resolved images should assist in charac- 
terizing the particular solar surface regions associated with TSI variations. Also, 
since the particular set of MWO data on which this analysis is based is available 
on a daily basis back to at least 1985, and on an intermittent basis before then, 
it will be possible to estimate the TSI emission due to identified solar surface 
features at several solar minima to constrain the role surface magnetic effects 
have on long-term trends in solar energy output. 
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1. Introduction 

Observations of solar surface magnetic and intensity data reveal a myriad of 
features which evolve on a wide range of spatial and temporal scales. These 
features are in turn related to variations in solar radiative output both in specific 
wavelengths and in total solar irradiance (TSI) ( [Harvey, 1994[ ). One challenge 
to ascertaining the relationship between particular solar surface features and 
variations in solar radiative output is the very wealth of data available both from 
individual solar observations and from archives of individual solar observations 
which in some cases, such as those available at Mount Wilson Solar Observatory 
(MWO), cover many decades. Traditionally, such features were identified manu- 
ally and categorized by experts into familiar categories such as sunspots, faculae, 
quiet sun, etc., a process which inevitably produces considerable variation in 
how, and into which, feature categories regions from solar images are classified. 
Recently, researchers have applied automated Bayesian statistical pattern recog- 
nition systems to identify and characterize solar features, using expert knowledge 
of the magnetic and radiative characteristics of the recognized types of regions to 
guide the software ( |Turmon, Pap, and Mukhte^ [20021 [T9981 [Pap et al.\ [T997|) . 
Our approach has a similar goal but uses classification based on individual 
pixel properties instead of pattern recognition based on spatial configurations of 
groups of pixels. 

An automated classification or pattern recognition approach is particularly 
promising for analyzing the large database of solar observations taken with the 
150-foot solar tower at MWO which has been in daily operation since 1912. Since 
the mid-1980s, the observations at MWO have included daily magnetograms 
and intensity grams taken in the 5250.2A Fe i and 5237.3A Cr ii fines. The 
observing system has been discussed in a number of papers ([Howard et ail 119831 
lUlrich et aL[ll99Tl [2002. 2009j and utilizes a Babcock magnetograph system that 
scans an entrance aperture over the solar surface following a boustrophedonic 
pattern to build up a full disk image. Because the magnetic and intensity images 
are captured simultaneously, each pixel in a magnetic or intensity image based on 
a particular spectral line corresponds exactly to the solar surface area covered by 
the corresponding image pixels in other lines. This process permits simultaneous 
observation of the same part of the solar surface in different wavelengths. 

One product of the observations is the magnetic field strength for which we 
use the absolute value of the measurement at A5250: |i?A525o|- We used only 
absolute magnetic field values (measured in gauss) for the magnetograms since, a 
priori, the sign of the field did not seem likely to affect the intensity of radiation 
from that pixel. For simplicity of notation we will use \B\ for this parameter 
since this is the only field we use in this study. It is important to recognize that 
magnetic field strength values are dependent on a variety of factors including the 
spectral line and spectral sampling as well as the spatial resolution. The values 
given for the classifications discussed in this paper are those from the 150- foot 
solar tower synoptic program without correction to a A5233 scale as discussed 
by lUlrich et al.\ ([2009P . A second product from the A5250 and A5237 fines is 
an intensity-like image which we have dubbed an intensity ratio-gram. These 
consist of solar image arrays for which each pixel i has a value /r; that is the 
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ratio of the A5250 intensity /a5250 in that pixel to the A5237 intensity /a5237 in 
the same pixel: 

^ 10000 (/a5250//a5237), 
(-fA525o/^A5237)peak 

where the intensity ratio-gram values are based on an arbitrary scale in which 
10000 is the value of the peak of each image's probability distribution function for 
Iri. Prior to use in the classification analysis, the /r images have been adjusted 
to remove the dependence on center-to-limb angle p by fitting to a polynomial in 
powers oi X ~ [1 — cos(p)] up to but with the term in X^ omitted. The fitting 
is done twice with intermediate steps 1) to identify and remove plages and spots 
using a histogram analysis and 2) to remove a small E/W gradient probably due 
to a telluric feature in the A5250 line. This flattening is carried out on each image 
prior to summing into the daily average image. Also, to avoid some observational 
deficiencies near the solar limb, we have restricted our analysis to pixels having 
sinp < 0.95. Because the magnetic sensitivities of the lines differ, the intensity 
ratio produces an image in which the usual surface features-spots, taeniae, etc — 
are visible. The advantage to the intensity ratio-gram is that, by taking the ratio 
of the two intensities, short term variations in seeing and atmospheric conditions 
can be greatly reduced or removed, yielding an image which is superior in terms 
of solar surface feature resolution to an image in either wavelength alone. 



2. Use of AutoClass to Classify Solar Surface Features in MWO 
Magnetogram and Intensity Ratio-grams 

2.1. The AutoClass Software 

AutoClass is a classification code developed by ICheeseman and Stutzl p996p 
based on a classical finite mixture model utilizing Bayesian statistics for de- 
termining the optimal class to which a particular observation belongs. The 
observed parameters are referred to as attributes and can be quite general. Each 
observation is called an instance and is described by the attributes. For us the 
attributes consist of two quantities: the absolute value of the magnetic field and 
the intensity ratio. Additional observed quantities could be added if available 
but for our current effort we have retained just these two. The instances are the 
individual pixels. These instances (data pairs for a single pixel) are treated in 
isolation from all other attributes such as the image to which they belong. We did 
explore the addition of the center-to-limb position as an additional parameter 
but found it did not improve the performance of the classification. Although 
the fact that pixels belong to specific images in a time sequence does not enter 
into the classification analysis, we do utilize this information to understand the 
interrelations between the classes. 

The classification can be carried out in two different modes. The first mode is 
a search mode in which it conducts a search for the optimal classification scheme 
based on posterior Bayesian statistics applied to a specific set of observed data. 
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The search mode can be (1) unconstrained or open ended, in which AutoClass 
determines both the optimal number of classes and the characteristics of those 
classes or (2) constrained or user directed, in which certain limits, such as the 
number of classes to be found, can be imposed on the search. The output classifi- 
cation consists of a set of classes. Each class is defined by a distribution function 
giving the joint probability of occurance of the attributes for a class member 
instance. The parameters defining the probability distribution functions (pdf 's) 
for each attribute are the primary output from the search mode classification. 
In our case the data used to find the optimal classification scheme consists of 
all the data pairs for pixels on a selected subset of solar images spanning the 
full solar cycle. The second mode is a predict mode in which a user-directed 
classification is carried out by the application of the known classes to a new 
data set to determine the probability each data instance belongs to each of the 
classes. For the MWO data the attributes are specifically the absolute value of 
magnetic field \B\ and the 5250/5237 intensity ratio /r. 

When run in the open ended search mode, AutoClass decides both the optimal 
number of classes into which the data should be divided and the characteristics 
(in terms of observable attributes such as intensity ratio and magnetic field) of 
those classes. The result of running AutoClass in the search mode on a data set 
is an instance by instance (here, pixel by pixel) classification in which each in- 
stance/pixel in the searched dataset is given a weighted probability assignment to 
a subset of the total classes found in the search mode. For example, if the search 
mode found J classes for the entire data set, then a particular instance/pixel i 
with attributes/measurements \B\i and /r;, would be given a set of probabilities 
Pij of belonging to each of the classes j. In some cases, the most likely class will 
have a probability near unity while in other cases two or more classes may have a 
high probability. For many pixels the probabilities for the least likely classes may 
essentially vanish. There is thus no absolute assignment of an instance/pixel to 
a particular class. This mode requires the determination of the number of classes 
J as well as the parameters defining the probability distribution function (pdf) 
for each class so that a search mode run is computationally intensive. 

When run in the predict mode, AutoClass analyzes a new data set using the 
results of the search mode (the value of J as well as the parameters defining the 
pdf for each class) to assign each instance/pixel in the new data set into one or 
more of the J classes found in the search mode and to assign a probability to 
each class assigned to that pixel. In this analysis, no new classes are introduced 
and AutoClass assumes that any pixel in the new dataset can be categorized into 
one or more of the J classes found in the search mode. The product of Auto- 
Class as run in the predict mode consists of the assignment of class membership 
probabilities pij to every pixel. 

2.2. Apphcation of AutoClass to MWO Data 

To run AutoClass in the search mode using MWO images, we needed to use 
a reduced-size data set in order to achieve an acceptable running time. For 
this purpose we created a dataset consisting of 12 daily image pairs (MWO 
magnetogram; intensity ratio-gram), using days spread over the period 1996 to 
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2008 — i.e., for each year in the time period, we selected one day and for that day 
used a magnetogram and intensity ratio-gram for that day. In order to reduce 
noise, we (1) used images consisting of the average of the magnetograms and 
the average of the intensity ratio-grams for each selected day, (2) eliminated 
days for which there was only 1 image and (3) selected days for which there 
were at least 8 pairs of magnetic and intensity images. (The number of available 
images per day and type varies from to 16 or more depending on weather and 
time of year.) When assembled, this dataset contained measurements for 484,000 
pixels from 12 pairs of magnetogram and intensity ratio-gram images over 12 
years, each pixel having a measured value for the absolute magnetic field and 
the intensity ratio. 

As a ground-based observatory at a single site, the MWO system must cope 
with a variety of difficulties - weather-induced gaps, sky transparency variations 
and image blurring known as seeing. Apart from the observations where the 
sky transparency variations impacted the data, the variations of atmospheric 
seeing and other instrumental effects did not degrade the image quality since the 
entrance aperture has a size of 12" squared and the seeing image fluctuations 
are generally only 2 to 4 " on days of poor seeing. The effect of sky transparency 
variations is felt primarily through their introduction of tracking errors into the 
system controlling the position of the solar image. At a low level some unsat- 
isfactory images have been left in the data set. Individual daily classification 
data points have been examined when their results appear discordant and the 
tracking error mechanism appeared to have an impact on the data quality. Since 
these points tend to occur on days when one or two observations were made, the 
easiest way to reduce their impact on the analysis was to restrict the data points 
to those days on which three or more observations were made. The most adverse 
months are January to April. For some years, the end of December also tends to 
have poor data but this happened less than was the case for the later 4 months. 
We return to this discussion below in the context of a detected discrepancy 
having a period of a year. In a further note about systematic effects related to 
the observing system we point out that prior to winter 1993 the spectrograph 
experienced strong spectrograph seeing during periods of cold weather with the 
result that the magnetic fields were noiser at those times. None of the presently 
used data is subject to this problem. 

A core problem is to determine the number of classes required to repre- 
sent solar surface features. AutoClass can make this determination based on 
its Bayesian statistical model of the data pairs. However, our observational 
parameters come from a system where the spatial resolution does not isolate 
solar surface features that are uniform in their characteristics. Consequently, 
it is probable that some classes are a result of a variable filling factor. For 
example if {\B\y't , Ityt) and {\B\Q,IrQ) are the observational parameters re- 
spectively in a flux tube and the quiet sun and a particular solar surface pixel has 
(|i?|,/r) = /(|i3[FT, ^rpT) + (1 — /)(I^Iqi -frq), AutoClass may create separate 
classes for each of a limited range of / between and 1 whereas in fact there are 
only two classes present. In the event that AutoClass finds excess classes from 
this mechanism, the extra classes will form a subspace where the \B\,It: pairs are 
linearly dependent and the regression analysis will yield large, nearly cancelling 
coefficients. 
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AutoClass was first run in the search mode with tlie number of classes to 
be used either fixed or left as a free parameter to be determined by the code. 
The initial AutoClass search results yielded 37 classes. The regression analysis 
discussed in the following section indicated that the filling factor mechanism 
might in fact be playing a role in the MWO data set. Consequently we restricted 
the AutoClass search to have a maximum of 18 classes. Although the ability of 
the 37-class classification to reproduce the TSI was slightly better than that for 
18 classes, we believe that this result is not physically significant. Furthermore, 
the 18-class set is easier to relate to known solar features. 

Using the search mode results for the 18-class set, we then ran AutoClass in 
the predict mode on the daily average magnetograms and intensity ratio-grams 
for the period from May, 1996 to July, 2008. For purposes of running AutoClass 
in predict mode, we modified the operation sequence so that the images for 
each day n were treated as a single dataset containing a (|_B|,/r) measurement 
for each pixel. We indicate that pixel i is found in image n with the notation 
i €z n. Designating the total number of satisfactory days we analyze as N, the 
current data set has N = 3111. The predict mode run takes /r,j) for each 

i G n and returns a set of the 10 most probable classes pij for these pixels. 
The restriction on the number of returned class probabilities per instance is a 
feature of the AutoClass package necessitated by the potentially large number 
of classes that AutoClass can identify. We were not able to modify this feature. 
Because this set of probabilities is truncated, the sum of the probabilities of 
the returned classes does not reach unity and a renormalization of the class 
probabilities was necessary in order to ensure that the area index defined below 
properly represents the total solar surface. 

The next step was to use the new datasets created by AutoClass in the predict 
mode to create for day n an index for each class j: 



where /„ is the number of pixels in image n. Thus, the derived index Ajn is the 
expected fractional area of the solar surface covered by class j on day n. Also, 
for each i £ n we retain the position Xi, yi as the distance from the image center 
in the EW and NS directions relative to the image radius. We use these positions 
to reconstruct a 2 dimensional image showing the locations of the different class 
pixels. These images help in the identification of the classes with solar surface 
features of recognized type such as sunspots, plages, faculae and chromospheric 
network. 



3. Use of AutoClass Indices to Model TSI 

It is important to observe that, so far in the process, there has been no introduc- 
tion of any variables other than: (1) the measured intensity ratio and magnetic 
field values from the MWO images and (2) the AutoClass class probabilities and 
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daily class indices, the latter of which were based solely on the observed number 
distribution functions of the data pairs (|_B|,/r) with each pixel being treated 
as an independent entity. The creation of the daily class index from the sum 
over pixels associated with each image is a separate step outside the AutoClass 
system. These indices are the fractions of the area of the apparent solar disk 
covered by each class and having the property: 



j-i 

3=0 



(n= 1---3111) . (2) 



We are also interested in averages over all the daily indices and use a bracket 
notation 



TV 



n=l 

to indicate such an average with a similar notation for other quantities. 

In order to determine whether the daily indices Aj^ could be related to 
satellite TSI observations and solar surface features, we next performed a linear 
multiple regression to fit the 18 daily MWO/ AutoClass indices for 3111 days 
between May, 1996 and July, 2008 to 3 daily TSI measurements for the same 
period from: (1) TSIy, obtained from the VIRGO experiment on the coopera- 
tive ESA/NASA mission SoHO, ( |VIRGO Team, 2008^ (2) TSIc, the composite 
VIRGO dataset (version d40) obtained from PMOD/WRC, Davos, Switzerland, 
unpublished data from the VIRGO Experiment on the cooperative ESA/NASA 
Mission SoHO (the creation of which is described in lFrohlichll2000p : and (3) TSIr 
the space absolute radiometric reference (SARR) ( [Crommelynck et ai, 1995[ 
Dewitte et ai, 2004). In this paper we illustrate the method using the TSIy 



case while a subsequent paper will discuss the application of our methods to all 
three TSI time series as well as UV indicators. The above symbols will be used 
whenever we refer to the time series data. 

The regression fit we use contains only an overall constant and does not 
associate a constant with each class since a class with zero area must contribute 
nothing to the TSI. Thus we seek coefficients Sj giving a model S for the virgo 
TSI, TSIy, of the form: 

using the method of least squares. Each of the coefficients Sj corresponds to the 
TSI value the sun would produce in case it were covered entirely by a surface of 
class j. We can evaluate cq by averaging over all the n daily cases: 

,7-1 

(S) = co + Y.s,{A,) . (5) 
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Since the Sj have the vahie of S that would prevail if Aj = 1 and Sj {Aj ) is the 
average contribution of class j to S, we see that Sj{Aj) must sum to (S) and 
the value of cq must be zero. 

We can use the fact that the area indicics must sum to unity to introduce an 
arbitrary offset since any arbitrary constant C subtracted from the TSIy would 
result in an equal constant subtracted from all the coefficients sj: 

J-l J-l J-l 

Sn — C ~ ^ Sj Ajn — C ^ Ajn = ^ {Sj — C)Ajn (6) 
j=0 j=0 3=0 

where the requirement of all areas summing to unity is used in the first form of 
the right hand side of this equation. 

Thus the inclusion of a constant just provides a zero point offset for fitting 
parameters and is not constrained in the fit. If we drop one class from the 
model equation, then a non-zero constant will have value equal to the TSIy for 
the dropped class. We do not use this approach because we find that all the 
derived values of Sj depend on which class is dropped. To avoid introducing a 
dependence on which class is dropped from the fit, we have adopted a bootstrap 
method applied to the solutions without a constant as described below. 

For most classes, the values of Sj are close to the average TSI so we adopt 
the arbitrary zero point offset to be the global average of the TSIy and consider 
deviations: 

5s, = s, - {S) (7) 

and 

J-l 

SSn = Sn-{S) = Y, ^-^J ■ (8) 

For our current investigation concentrating on the VIRGO data the difference 
between the observation and simulation for day n is then: 

<5TSI„ = TSIy„ - Sn . (9) 

We have chosen to define (5TSI„ with the above a sign to simplify our discussion 
of the time dependence of 6TSI near solar minimum in Section [6] 

We find that the regression analysis is sensitive to the details of the data 
set to which the regression is applied. This may be due to the constraint on 
the sum of the Ajn indicies ( [Golan, Judge, and Miller, 1996[ ) or due to the near 
linear dependence of the observed data pairs brought about by the filling fac- 
tor effect. Although the quality of the fit is consistent when small changes are 
made to the fitted data, the coefficients themselves vary. We determine the most 
representative values of the coefficients from a bootstrap approach: 

1. The base pool of data consists of the actual set of 31 11 daily observations 
each of which is the 18 class indices combined with the TSIy for that day. 
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2. A modified data set is formed by a method called randomly selecting with 
replacement from the full base pool 3111 times. The modified data set contains 
some daily observations more than once and omits others. The temporal 
order of the daily observations is not preserved in the modified data set. 
However, including the TSIy as part of the daily observation prevents the 
fitting coefficients from being impacted by this temporal scrambling. 

3. The fitting coefficients are computed from the modified pool and recorded. 

4. Steps 2 and 3 are repeated up to 100,000 times to create distribution functions 
for each of the classes. 

5. The median from each distribution function is adopted as the appropriate 
value for that class. 

This method allows us to estimate the uncertainty in the fitting coefficients. As 
an additional measure of the uncertainty in the derived fitting coefficients we 
applied the method of dropping one class for a 17-class fit. Since the constant 
in that case represents the TSIy for that class, we subtracted the global average 
TSIy from that constant so it could be treated the same way as the remaining 
classes. Also, to avoid special focus on any particular class, we carried out this 
analysis for all of the classes to obtain 18 values for the fitting coefficient for 
each of the 18 classes. We use the average of coefficients from these 18 cases as 
an estimate we term the 17-class fit coefficients. We then compared the average 
of the coefficients for the 18 cases to the coefficients found from the bootstrap 
method. For all classes the coefficients derived in these two ways agreed with a 
difference much smaller than the error indicated from the bootstrap method. The 
worst cases were for classes 13, 17, 15 and 16 where the bootstrap coefficients 
are —0.53, —324, 54 and 20.9 W m~^ while the 17-class fit coefficients are 
-4.58, -327, 56 and 19.3 W m'^ respectively. 

The correlation between the MWO daily class indices and the VIRGO TSI 
measurement for the 3111 day period was 0.9565 for a fit to the 18 classes while a 
fit with the full 37 classes yielded 0.9625. Figure [T]showes the scatter plots for the 
MWO model, 5", and the VIRGO TSI, TSIy measurements. The tightness of the 
relationship between the model and observations is evident. In fact it is somewhat 
better than is indicated by the full set of cases since there is a long-term trend 
of the model relative to the observations. The color of the plotted points has 
been chosen so as to highlight this effect with the early part of the decade being 
green and the ending part of the decade being orange. The cause for this offset is 
not known since there have been no changes to the MWO instrumentaton that 
could give such a systematic drift. Figurc[2]showes comparative time series for the 
MWO fitted simulations and the observed VIRGO TSI data. If we average over 
the first 6-year period and compare it to the last 6-year period we find that the 
value of S has decreased by only 0.07 Wm~^ whereas the observed VIRGO TSI 
has decreased by 0.31 Wm~^ for the same periods. The spectrographic system 
at MWO has been held constant over these periods. 

The deviation plots show patterns with a period of a year. This is illustrated 
in Figure [3] which shows the result of a superposed epoch analysis with a one 
year period starting at the first of the year. We do not have an explanation 
for this pattern. We also see the systematic trend of the observed points to fall 
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Figure 1. The relationship between observed and modeled values of the TSI during the period 
1996 to 2008. The period analyzed coincides with solar cycle 23. During this interval, the short 
term variations are well followed but there is also a long-term trend for which the model 
deviates from the observations. To make this effect apparent, we have divided the period into 
a first half colored brown and a second half colored green in order to illustrate the smaller 
dispersion over the shorter periods. 



relative to the simulation for the full time period. The simulation is essentially 
the same for the two minima. There is a possible effect related to the tilt angle of 
the magnetic field vector in the polar regions. However, the symmetry between 
north and south regions leads to the expectation that such an effect would have 
a dominant component with a period of 6 months. Occasional bad observations 
that are not eliminated by our quality control procedures tend to have larger 
than appropriate magnetic fields and their associated model TSI values tend 
to be high. Since such images are more common during the winter and spring 
months, this effect would produce a result out of phase compared to what is 
shown in Figure [3) 

The datasets created by running AutoClass in the predict mode can also be 
used to create resolved images of the sun as it would be seen by a telescope which 
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Figure 2. This figure composed of three sets of two time series comparing the observed 
VIRGO TSI: TSlYn (orange solid lines) to the simulation, Sn (the crosses). The bottom of 
each pair shows the values of the observed and simulated TSI's while the top of each pair shows 
the deviation: simulated minus observed TSI values (note that <5TSI is defined as observed minus 
simulated). The full time series has been divided into three sets of 5 years with an overlap of 
one year between successive sets so that patterns can be seen near the dividing points. 
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Figure 3. The annual pattern of deviation in the simulation of the VIRGO TSI relative to 
the observations. Each point is the result of binning the (5TSI values according to the fraction 
of a year past the beginning of each year. For the full data set, each bin which is l/20th of 
a year has between 87 points in February and 214 points in July. The error bars shown are 
the errors of the mean for each bin. Prior to binning, the time series was linearly dctrended so 
that the long-term trend would not increase the estimate of the errors of the mean. 

detected TSI directly. Thus, because each dataset created in the predict mode 
contains for each pixel the probabilities for the 10 most probable classes for that 
pixel, and since each predicted class has a fit coefficient derived from the multiple 
regression of the daily MWO indices against the various TSI measurements, it 
is possible to create a solar TSI image by assigning to each pixel a value equal 
to the sum of the per class coefficients times the class probabilities plus the 
constant derived from the multiple regression. Images of this sort are discussed 
below for groups of classes rather than for individual classes. 

4. Class Groups 

4.1. Cross-Correlation Analysis 

The solar surface is covered by a number of recognized types of feature that 
typically include quiet sun, chromospheric network, plages/faculae and sunspots. 
The 18 (or 37) types of solar surface identified by AutoClass clearly makes a finer 
distinction of solar surface features than is currently recognized. In addition, we 
are concerned that the filling factor effect continues to generate artificial classes. 
AutoClass itself provides no guidance beyond the grouping of pixels into the 
classes. We utilize the additional information provided by the time series of 
daily indices to search for groupings among the classes. Thus if we find several 
classes that have highly correlated daily indices we can determine how this group 
as a whole might relate to the recognized solar surface features. Furthermore, 
the values of \B\j and Iij determined by AutoClass are a significant indicator 
of the nature of each class. 
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Table 1. Class groups and their eross correlations 

Each cell represents the correlation coefficient between the two classes at the ends of the row 
and column intersecting at the cell. The lines mark out the classes that have been identified 
as groups with the group symbol either left or above the classes making up the group. The 
correlations between classes within each group are higher than with classes outside the group. 
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Our grouping of the classes is aceomplished through the study of the cross- 
correlation matrix formed from the 18 time scries of the daily class indices. 
AutoClass creates a class number according to its internal processing and gen- 
erally gives low class numbers to those classes having the largest number of 
members. However, the class numbers themselves have no apparent useful in- 
formation. The procedure we use to find class groups is to identify a subset of 
the classes that can serve as root classes for their group. Based on a tentative 
identification of a class as a potential root class, we can arrange all the remaining 
classes in order of decreasing cross-correlation coefficient. All classes having a 
cross-correlation coefficient above some threshold are then identified as being 
members of that group. The root class for the next group can be taken as the 
class having the highest cross-correlation coefficient below the threshhold. This 
process can be repeated until all classes are members of some group. In some 
cases, the group has only one member if its root class has no other classes with a 
cross-correlation coefficient above the threshold. This stepwise grouping process 
is simple and yielded good results. More complex problems might benefit from 
the global approach detailed in [Banerjee, El Ghaoui, and dAspremont| ()2008p , 
which decomposes the covariance into groups using a maximum-likelihood cri- 
terion based partly on the inverse covariance matrix where the group structure 
is directly encoded. 

The lowest number class found by AutoClass has the largest number of pixels 
and examination of the values of |-B| and /r strongly suggest that this class 
corresponds to the quiet sun. As long as we follow the above procedure of 
beginning each group by taking the root class to be the class with the highest 
cross correlation below the threshhold from the previous group, the sequence of 
groups will represent a systematic increase in the level of activity. In fact the 
values of the \B\j and Irj determined by AutoClass for the classes within these 
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Table 2. Properties of class groups 

This table gives the principal properties of the classes and groups. The classification parameters 
\B\j and Irj are in gauss and are 10000 X (intensity ratio) respectively. The 5sj are from the 
bootstrap/regression analysis of the TSI Virgo observations and are given in Wm~^ deviations 
from the average of the TSI as given by Equation [7] The center-to-limb distances of the classes 
and groups are shown here using the averages (rj) of pijr = Pij sin(p) over all pixels. 
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groups confirms this expectation. Tlie value of the tlu'eshhold cross-correlation 
coefficient to define the dividing point between successive groups could not be 
held constant for all the groups. This process allows us to find six groups of 
classes that we name Quiet (QO), Quiet 1 (Ql), Network (N), Plage (PO), 
Plage 1 (PI) and Spot (S). 

The result of the above procedure is shown in Table [TJ For most of the groups 
the assigned classes are clearly distinctly associated with that group and there 
is little flexibility in making the assignment. However, for the two plage groups 
PO and PI, the association is less clear. For example, if the root class of group 
PO were chosen as class C5 instead of class C13 then class C12 would have been 
assigned to PO instead of PI. However, based on the magnetic field strength, it 
is more appropriate in group PI. The number of classes within each of the above 
groups is 3, 1, 1, 6, 5 and 2 respectively. The fact that the Plage groups have 
the largest number of classes as members is probably a consequence of the filling 
factor mechanism. 

4.2. Class Group Properties 

The relationship of the classes and groups to solar properties is of great interest 
and is shown in Figure [3] and Table [D Although the classes in the groups are 
similar, their regression coefficients are often of different sign within the groups 
suggesting that there may be more parameters than needed for the basic fit. 
However, the uncertainties derived from the bootstrap analysis indicate that 
sign of the coefficient, especially for the PO and PI groups, is not significant. 
The concentration of the classes with the least certain fitting coefficients in 
these two groups is supportive of the hypothesis that uncertainty in the fit 
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Figure 4. This figure shows the values for \B\j and Irj along with the Gaussian widths 
determined by AutoClass for the 18 classes. In order to have a uniform spacing of the \B\j 
values, they are plotted in a logarithmic format. For these the Gaussian width is shown as 
0.4343 5 |-B|/|-B|. The membership of each class in its assigned group is indicated by the color 
code. For several of the classes in group PO, the properties overlap and are not clearly displayed. 
This is consistent with our conclusion that some of the classes are in fact the consequence of 
the filling factor effect. 



comes from the flux-tubc/filling factor mechanism discussed in Section [221 The 
systematic increase in magnetic field strength in going through the sequence (QO, 
Ql, N, PO, PI, S) is evident. The intensity ratio is an indicator of the brightness 
of the upper photosphere compared to the low photosphere with larger values 
indicating a brighter upper photosphere/chromosphere zone. Its dependence on 
the magnetic field strength is mostly along a well defined curve where the upper 
atmosphere is brighter where the field is stronger. Exceptions occur within the 
PO and PI groups where the dependence is erratic. In the case of the Ql group 
(a single class) the chromosphere is brighter than other portions of the solar 
surface with similar field strength. The average areas given in the last column 
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for each group are averaged over the full solar cycle and show that over 80% 
of the solar surface is covered by the QO, Ql and N groups. Nonetheless, these 
groups have low cross-correlation coefficients and represent distinct components 
of the solar surface. 

A useful diagnostic for the classes and groups is their distribution over the 
solar surface. We select only those pixels whose probability of membership in a 
single class j is over 90% and utilize their center-to-limb angles pij as measured 
by rij — sin{pij). For each image n we calculate the average of PijVij to get 

a measure of the aVGrRge pixel distclIlCe (^"n)aven — {Pij'^ij^avc n I An where the 

notation (a)avc n denotes an average over the pixels in image n. Since we are 
interested in the distribution of the class pixels relative to the analyzed image, we 
rescale the values of by dividing by 0.95. The daily averages arc then averaged 
over the full data set to yield (rj). We have also considered the rms spread in 
Tij denoted by Ar^ . For reference we note that a quantity distributed uniformly 

1/2 

over the solar image would have (rj) = 2/3, (Aj-j) = (1/2 — 4/9) ~ 0.236. 
This analysis clarifies the distinction between class 16 and the other PI and PO 
classes with similar magnetic field strength - class 16 is systematically nearer 
the limb than the others and has a higher value of /r as a consequence. 

Much of the variability dominating the cross-correlation coefficients comes 
from the rotational modulation of the solar surface coverage. Changes due to 
the 11-year cycle are also of interest. The growth or decline of the fractional 
coverage of a particular class or group of classes may provide a clue as to the 
underlying process of the solar dynamo. The relative sizes of the coefficients 6Sj 
allows the groups PI and S to have a major impact on the TSI even though the 
areas themselves may be small. We reduce the rotational modulation in the areas 
and TSI model by smoothing all parameters with a Gaussian function having a 
width of 30 days. We show in Figures [5] and [6] respectively the smoothed trends 
for the contributions to the TSI and the changing breakdown of the class areas 
within three groups of classes. Several of the groups influence the TSI but show 
only a restricted dependence on the solar cycle: QO, Ql and N. The case of 
the PI group is interesting because the D panel of Figure [5] suggests that class 
C8 accounts for most of the TSI variation. However, all classes within the PI 
group contribute a nearly constant fractional area to the PI group and their 
contributions to TSI differ primarily because of the differing values of their Sj 
coefficients. The substantial cancellation effects amoung the Sj coefficients within 
the PI group are typical of an underdetermined system having more parameters 
than constraints. The least square fitting exploits the small differences between 
the class time dependencies through these cancellations. 

Because the QO group occupies a major fraction of the available surface area, 
when the PO and PI groups increase their area of coverage, the QO and Ql 
groups decrease their area because they provide the areas supplying the PO and 
PI groups. During the rising portion of the solar cycle, the conversion of the quiet 
group areas into plage group areas is associated with a rising TSI. Consequently 
some of the increase in the TSI is attributed to a negative coefficient for the QO 
and Ql groups. 
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Figure 5. The smoothed time trends of the classes and class groups. The time dependences 
have been smoothed with a Gaussian with a width of 30 observations. The top figure, panel 
A, shows the areas, the second panel B shows the contributions to the TSI. In both cases we 
show the classes grouped according to the system outlined above. The third and fourth panels 
C and D show the components of two of the groups: that for the QO group in panel C and that 
for the PI group in panel D. 
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Figure 6. The smoothed time trends of the fractional area the classes within a class group 
occupy relative to the total area of that class group. 



5. Images 

The relationship between the observable parameters and the derived change in 
the model TSI has both expected and unexpected parts. In this section we mostly 
show a number of representations of the solar image for a representative day of 
16 May 2002. We also use data obtained on 9 Sep. 2003 with one image from 
MWO and another image from the balloon-borne Solar Bolomctric Imager (SBI) 
instrument ( [Foukal et ai, 2004| . 16 May 2002 has a moderate level of activity 
and not a high level of activity wherein the regions might cover more of the 
solar surface and tend to merge or interact. It is not unusual in any obvious 
way and is of good quality data not impacted by terrestrial sky transparency 
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Figure 7. The absolute magnetic field and intensity ratio data images along the top row and 
the derived model TSI image on the lower left. The lower right image is from the Meudon 
synoptic spectroheliogram program and shows the intensity in the K3 line of Call and is 
shown for comparison to more traditional measures of solar magnetic activity. The MWO and 
Meudon images are for May 16, 2002 but are not temporally synchronized so that magnetic 
contours from MWO do not match this Call K3 image although the temporal changes from 
MWO to Meudon are not large. 



variations. Figure [7] shows images of the absohite value of the magnetic field 
and the intensity ratio along the top row. For comparison on the lower right 
is a familiar representation of the chromospheric structure as measured by a 
Ca K3 spectroheliogram adapted from the synoptic program carried out at the 
Paris Observatory at Meudon. This K3 image is displayed here as a negative in 
order to be more consistent with the magnetic and intensity ratio images that 
have been created entirely digitally. The close correspondence between magnetic 
field strength and the K3 intensity is well known. The intensity ratio has not 
been used before as a diagnostic of solar surface features. However, the image 
shown in Figure [7] is very similar to that for \B\ with two differences: 1) the 
strong magnetic field regions have a reduced value of the intensity ratio, and 2) 
in regions of very weak magnetic field, the intensity ratio continues to become 
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Figure 8. This figure compares the spatially resolved image from the Solar Bolometric Imager 
(SBI) obtained on 9 Sep. 2003 by IFoukal et oD l|2004|l to a model TSI image prepared 
according to the procedures described here. Both images are on a greylevel scale where black 
is 95% and white is 103.8% of the average. The SBI data has been corrected for limb darkening 
and was provided to us by the authors in a digital format so that we could create the image 
with the grey levels matching ours to illustrate the key differences compared to our TSI image. 
Note that the images are shown with the same angular scale and that the SBI image goes to 
the solar limb whereas our image only goes to a point 5% in from the solar limb and has a 
smaller image. 

lower even though the field itself is lower by only a very small amount. When 
these features are classified and converted to a contribution to the TSI, a map 
of the solar surface can be created. The result for the sample day is shown in 
the lower left. Note that the positive contributions to the TSI arc not as widely 
distributed as are the magnetic and intensity ratio features. 

The balloon-borne Solar Bolometric Imager (SBI) has provided us with a 
real image of the sun of the sort that our algorithm attempts to produce. The 
comparison between the real SBI image and our model is shown in Figure [H 
The image based on the class coverage fractions is not the local photometric 
contrast but instead is the local contribution to the change in the TSI. The 
SBI image shows the local contrast wherein the deviations near the limb are 
enhanced relative to ours. The model TSI image differs from the observed in the 
important characteristic that it has placed an excessive fraction of the irradiance 
deviation in a bright rim around the plage rather than leaving it distributed 
somewhat uniformly in the plage or facular regions. This pattern is most likely 
a consequence of the ring effect discussed above in Section 14.21 and is related to 
the ill-determined coefficients in the plage groups. 

The grouping of the classifications in Section 14.21 is based exclusively on the 
cross correlations between the time series created using the fact that the pixels 
are from solar images. The names of the groups imply an association with known 
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Figure 9. Images for 16 May 2002 formed from the classes using a CMY encoding. In each 
image a class or a group of classes is assigned one of the three colors so that for the upper left 
image representing the breakdown of the QO group, class CO is assigned the cyan color, class 
C4 is assigned the magenta color and class C6 is assigned the yellow color. Each color is fully 
saturated when the probability of the pixel belonging to that class is 100% and is white when 
the probability is 0%. Multiple colors are combined when the probability is shared among 
several classes. The contours shown in orange on the figures arc lines of constant \B\ starting 
with 2G then 7G, 20G, 70G, 200G and 700G. 

solar structures although some of the groups are subdivisions of these previously 
known structures. The sequence (QO, Ql, N, PO, PI, S) is ordered in terms of 
increasing magnetic field. The values of the field in each case is consistent with 
the fields for these structures except perhaps for the N group (the single class 
CI) and the PO group where the fields are weaker than expected. An important 
diagnostic of the nature of these groups comes from the distribution over solar 
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images and the spatial proximity of pixels of each class or group to other classes 
and groups. 

We have created diagnostic images from the class groups using a CMY (Cyan, 
Magenta, Yellow) encoding wherein a one to three color tiff image has each of 
the color strengths based on one or more class group probabilities - each pixel 
is fully the assigned color if its probability of membership in the assigned class 
is 100%. Pixels having a significant probability of being in two or three classes 
displayed in the figure have their colors composed of those class value and can 
take on an appropriate intermediate hue. Class group images created this way 
are shown in Figure[9]for QO, Ql, N and (PC, PI). The three classes making up 
the QO group are shown using the full CMY encoding, the plage groups are both 
shown in one figure using two colors while the remaining two groups have only 
one class and are show separately using the cyan color alone. The S group is not 
shown in a panel because the areas involved are very limited and can be seen 
adequately as the blue and green areas on the /r^ panel of Figure [T] In order to 
aid in the coordination of these class group figures with the input magnetic field 
pattern, an overlay of contours of constant \B\ strength values is plotted on top 
of the class group plots. 

The PI magenta areas are clearly very concentrated about the high magnetic 
field positions while the QO areas clearly avoid the magnetic contours. The 
gradation of the classes within the QO group is clearly a sequence with the 
quietest CO class forming the center of the areas and being surrounded by the 
C4 and C6 classes in succession. The group identified as N is widely distributed 
as expected for the solar chromosphcric network justifying our naming of this 
group. The Ql group appears to occur preferentially near the solar limb and this 
is probably the reason it is not well correlated with the other classes that have 
been grouped as QO. 

Referring back to Figure [71 it is evident that the plage areas have an alter- 
nating ring pattern where the central portions of the plage are not bright but 
are surrounded by a rather bright ring. This pattern is not seen in either the 
\B\i or the Iri figures. According to Tablc[2l the PI group has both positive and 
negative Sj coefficients with the area weighted average being positive. In Figure[7] 
the Si panel is computed from the full set of classes rather than the class groups. 
In the parts of the figure with strong active regions characterized by sunspots or 
concentrations of PI areas, this alternating ring pattern is seen quite clearly. In 
more isolated areas which only have PO or PI areas, the alternating ring pattern 
is not present. This pattern comes primarily from the way the Sj coefficients have 
been constrained in the fit to the TSI. The most strongly positive coefficient is 
that for class 8 while the stronger magnetic field classes 10 to 12 have small or 
negative coefficients. An examination of the spatial distribution of the class 8 
pixels shows that they are found both associated with the plage regions and in 
areas well away from the stronger plage zones. Evidently the widely-distributed 
pixels are essential for the fit to the TSI and lead to a high Sj. Apparently 
this large coefficient yields a higher-than-needed contribution from the stronger 
plage regions so that an additional contribution from the class 10 to 12 pixels is 
not needed to match the TSI. The close spatial proximity of the class 8 pixels 
and the class 10 to 12 pixels makes this cancellation effective in reducing the 
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contribution to the TSI variation. The extended pattern of smaher brighter and 
dimmer regions outside the active areas is a combination of contributions from 
the Ql, N and PO groups. 

6. The Solar Minimum Slope Difference 

We discussed above the inabihty of AutoClass-assigned areas to reproduce the 
long-term trend of decreasing TSI as measured by Virgo - the Virgo TSI has a 
greater rate of decrease than is produced by solar surface magnetic features. Our 
above conclusion could be a consequence of the tight constraints on coefficient 
values imposed by the large amplitude variations in the TSI due to the mod- 
ulation of stationary plage and sunspot regions carried by rotation across the 
apparent solar surface. By temporally smoothing the time series to eliminate 
the rotational modulation of the TSI record, the least square fitting may be 
freed to fit the long-term trend mismatch. We have tested this supposition by 
carrying out a set of modified fits using both smoothed indices and the smoothed 
Virgo TSI data. The degree of smoothing is defined by the Gaussian width 
parameter a which we have taken to be 3, 31 and 101 successive points. Our 
points are separated by one or more days since observing conditions do not 
permit data acquisition every day so that the actual smoothing period varies 
but the number of observations smoothed does not. After smoothing, the data 
sets were subsampled in order to retain only independent time sequence points 
and the regression fit using the bootstrap approach was repeated. To determine 
whether these smoothed fits permit a better representation of the long-time 
trends, we define two temporal averaging periods: Ave22m(^) and Ave23m(^) 
which are 12-month averages of X at the beginning of our series May, 1996 to 
April, 1997 and August, 2007 to July, 2008 respectively Recalhng that 5TSI is 
the observed minus model, we define a solar minimum slope difference 5SMS as: 

5SUS = [Ave23m(<5r5/)- Ave22m(<5r5/)]/(t23m-t22m) . (10) 

For the Virgo observations the solar minimum slope is — O.OSOWm^^yr^^. The 
smoothing and resampling reduces the number of constraints on the model so 
we considered three sets of class groups: 

6 variables - The class groups discussed above. 

7 variables - The class groups discussed above except for the QO case which was 

split so that class CO was one group and classes C4 and C6 were considered 
a second group. 

18 variables - All the original classes were considered as independent. 

In Table[3]we report the values of (55 MS for six cases defined by the above three 
numbers of variables and by the two smoothing range. An 18-month averaging 
interval for the periods at the beginning and end of the series was tried and yield 
essentially the same results as given in Table [3l We find that both smoothing 
and reducing the number of fitting variables make (5SMS more negative. As an 
additional indication of the effect of the trend on our fits, we evaluate the Pearson 
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Table 3. Properties of fits 

(5SMS (Wm-2yr^^) 
cr (days) 



Vars. 
6 
7 

18 



none 



31 



-0.025 
-0.023 
-0.023 



-0.024 
-0.023 
-0.022 



-0.028 
-0.026 
-0.024 



rp with trend 
cr (days) 
none 3 31 
0.888 0.926 0.909 
0.890 0.928 0.909 
0.953 0.958 0.925 



rp dctrcnded 
a (days) 
none 3 31 
0.898 0.941 0.927 
0.897 0.940 0.926 
0.963 0.971 0.931 



correlation coefficient rp for these cases both with and without the trend between 
the two solar minima. For the detrended cases, we adjust the observed values 
according to: 

5TSI-h0.0254Wm-2yr-i(t- 2002.0) — > 5TSI (11) 

where the notation indicates that the revised 5TSI is the result of the addition 
left of the arrow. The improvement in the value of rp for the 3-point smoothed 
case is a result of the reduction of noise in the MWO model due to observa- 
tional factors. Our highest value of rp = 0.971 indicates a remarkable degree of 
success in the method's ability to reproduce the observed variations in the total 
solar irradiance. The scatter diagram relationship between the model and the 
observations is shown in Figure 1101 

These results strengthen our earlier conclusion that the long-time trend in 
the difference between the model and the observations is not a consequence of a 
change in the magnetic coverage of the solar surface. If this discrepancy were to 
be explained by changes in the solar radius and/or surface temperature it would 
require a decrease of about 80 milli arc-sec and/or 0.3°K over the spanned period 
of 11.25 years. The observed radius change is opposite to what might be required 
to explain the slope excess or too small to influence our result dKuhn et al, 2004[ 



[Djafer, Thuillier, and Sofia, 2008a||Djafer et al., 2008b[ ). The temperature change^ 
is difficult to measure directly and is closely related to the TSI data such as we are 
discussing here ( |Fazel et al., 2005| . Since our analysis already includes the effect 
of the changes in surface coverage by different magnetic classes, this additional 
change must come from the temperature of one of the quiet classes - what has 



been referred to by Livingston and Wallace (|2003p as the immutable basal 



temperature. The rate of change in this immutable basal temperature required 
by the (5SMS is small compared to the amphtudes these authors discussed. 



7. Discussion 

The problem of identification and measurement of different parts of the solar 
surface along with the determination of their relationship, if any, with vari- 
ability of the solar energy output has long been recognized and studied, see 
for example the reviews by ILeanl)19881 I1997P . The strength and geometry of 
magnetic fields is the underlying factor that distinguishes one portion of the 
solar surface from others. Work by |Skumanich, Smythe, and Frazier| ()1975p 
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Figure 10. The relationship of modeled values of the TSI versus the 3-point smoothed and 
detrendcd observed values of the TSI during the period 1996 to 2008 using the same format 
as Figure [T] 



showed that there is a direct relationship between the magnetic field strength 

and the strength of emission in the radiation near the core of the Call K 

spectral line. Spectroheliograms utilizing this spectral band have been used 

to identify facular and sunspot regions and estimate their impact on the TSI 

dWhite and Livingston, T978l[SkImanich et aZ.,T984l[Fligge et al, 1998l|Harvey and White, 1999| . 

The He I line at A 10830 A has also been used to estimate the impact of faculae 

and network on the TSI ( Foukal and Lean, 1988[ ). Although the strength of the 



magnetic field is assumed to be the underlying cause, these papers have all used 
the proxy of the Call K emission or the Hel A10830A emission as an index to 
estimate the field strength. The study by IWenzler et al.\ p006p has used mag- 
netic fields from the NSO-512 and the NSO-SPM at the KPVT to model the TSI 
variations in solar cycles 21 to 23. The spatial resolution of these magnetograms 
is 1.14" X 1.14" and the magnetic field measurements have a noise level of 7.5 
G to 9 G. This study also identified spots using continuum images. For a period 
from 1992 to 2003 these authors obtained an optimized reconstruction of the TSI 
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having Tc = 0.94. The identification of features from their geometry and contigu- 
ity on the solar surface is a common goal of all these investigations. While direct 
inspection is sometimes used, it is desireable to have a system that carries out this 
identification automatically so that potential investigator bias can be removed. 
Two studies have implemented such automatic pattern recognition and clas- 
sification systems ( [Turmon, Pap, and Mukhtar, 2002} | Jones et ai, 2008| . The 
generally recognized features are sunspots, faculae and the network. Of these 
the identification and measurement of the network has been most problematic. 
Our approach differs from previous investigations in several ways: 

1. The classification completely abandons all image-based information. All pixels 
are treated equally. 

2. We use a new type of data based on a line intensity ratio. This quantity is 
sensitive to atmospheric properties not previously included - the brightness 
in the upper photosphere/lower chromosphere. 

3. Our magnetic field measurement has a noise level of about 0.2 G ( jUlrich et al, 2009D | 
for the images used that are averages of 4 to 15 individual magnetograms. 

The resulting classification has allowed us to reproduce the observed TSI with 
rp reaching 0.97 for the case where the Virgo observations have the long-term 
trend removed and some of the MWO noise is reduced by smoothing with a 
3-point wide Gaussian. This high coefficient indicates that we need to take the 
classification seriously even though there are a number of deficiencies that are 
evident from the comparison to the SBI image. The cancellation effect illustrated 
by the alternating sign of the coefficients for individual classes among the groups 
we have called PO and PI are indicative of an inadequacy of the classification 
- probably a consequence of our low spatial resolution. However, one strong 
aspect of our study is the low noise for the magnetic field strengths. This is 
partly a consequence of our large size pixels. If a similar level of noise could be 
achieved with pixels with sub-arcsecond size, the classification might be better. 
Another way the classification could be improved is through the addition of 
more measured quantities such as a true continuum intensity that would give 
better identification of sunspot areas. The MWO system does not provide such 
a quantity so its addition will require merging of an external data set wherein 
the temporal and spatial scales will need to be adjusted to match those of the 
MWO data set. Other observational systems that are internally consistent do 
not provide a quantity having the properties of the It ratio we have used. Part 
of the success of our analysis has come from the fact that this ratio is sensitive to 
the upper photosphere instead of the mid chromosphere. However, it is possible 
that other combinations of quantities could be used in the AutoClass system 
with success similar to ours. 

8. Conclusions 

This analysis holds out the possibility of creating an on-going, accurate, inde- 
pendent estimate of TSI variations from the MWO ground based observations 
which could be used to fill in gaps in the satellite record. Comparison of the 
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solar surface features seen in the TSI images created from the various TSI mea- 
surements may shed light on the sources of disagreement between various TSI 
measurements. Further, the spatial resolution of these images should assist in 
identifying with greater accuracy the particular solar surface regions associated 
with TSI variations and could help in addressing questions such as those raised 
by IFoukal and Bernascoml (|2008p . 
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